
feat(framework): Phase 3 PR 1 — audit prompt templates + output schema #85

Merged
montfort merged 1 commit into main from feat/phase3-pr1-audit-artifacts
May 3, 2026

montfort commented May 3, 2026

Summary

First of 6 PRs implementing Phase 3 (multi-model external audit) + the open frictions F2/F5/F7. Framework-only — no CLI code yet.

What's added

  • dist/.devtrail/audit-prompts/auditor-primary.md — prompt template for the primary auditor.
  • dist/.devtrail/audit-prompts/auditor-secondary.md — prompt template for the secondary auditor (different model family).
  • dist/.devtrail/audit-prompts/calibrator-reconciler.md — prompt template for the third-tier calibrator that reconciles the two auditor outputs.
  • dist/.devtrail/schemas/audit-output.schema.v0.json — JSON Schema Draft 2020-12 with oneOf discriminator on audit_role (auditor outputs vs calibrator output).

Architectural decision A1: orchestration-only

Phase 3 v0 is orchestration-only, not an HTTP-API client. The CLI prepares and persists prompts, awaits the operator's responses, validates outputs against the schema, integrates findings into the Charter telemetry — but does not invoke any LLM API directly.

Rationale:

  • Implementing 3 HTTP clients (OpenAI / Google / Anthropic) is 1-2 weeks + perpetual maintenance when APIs change. Premature for an experimental v0 schema.
  • Sentinel's empirical pattern (the 6-cycle dual-audit experiment that motivated Phase 3) already uses this human-in-the-loop shape via /plan-audit skills. The CLI's value-add is the canon (prompt shape + output schema + telemetry integration), not the API call.
  • Closes RFC #82 (audit visibility) by design: the resolved prompt and the auditor's response are both files on disk, version-controlled and inspectable.
  • Aligns with principle #10 (honesty about what the tool does not do): "no LLM gateway, no model evaluation".

Schema design

  • oneOf discriminator on audit_role: three fixed roles, not arbitrary N.
  • findings_by_category enum (hallucination | implementation_gap | real_debt | false_positive) is the same vocabulary used by external_audit in charter-telemetry.schema.v0.json. The audit cycle output integrates directly into Charter telemetry at close.
  • Every output declares prompt_used: <relative path>, satisfying RFC #82's requirement that the prompt path be discoverable from the output.
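
To make the three-way split and the shared vocabulary concrete, here is a minimal Rust sketch. The enum names and the exact audit_role string values (auditor-primary, auditor-secondary, calibrator) are assumptions modeled on the template file names; only the four category values come from the schema description above. It illustrates the discriminator, it is not how the JSON Schema itself is enforced.

```rust
use std::str::FromStr;

/// The three fixed roles carried by `audit_role`.
/// String values are hypothetical, mirroring the template file names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditRole {
    AuditorPrimary,
    AuditorSecondary,
    Calibrator,
}

impl FromStr for AuditRole {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auditor-primary" => Ok(AuditRole::AuditorPrimary),
            "auditor-secondary" => Ok(AuditRole::AuditorSecondary),
            "calibrator" => Ok(AuditRole::Calibrator),
            other => Err(format!("unknown audit_role: {other}")),
        }
    }
}

impl AuditRole {
    /// Which `oneOf` branch applies: auditors report fresh findings,
    /// the calibrator reconciles the two auditor outputs.
    pub fn is_auditor(self) -> bool {
        matches!(self, AuditRole::AuditorPrimary | AuditRole::AuditorSecondary)
    }
}

/// Category vocabulary shared with `external_audit` in
/// charter-telemetry.schema.v0.json.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FindingCategory {
    Hallucination,
    ImplementationGap,
    RealDebt,
    FalsePositive,
}

impl FromStr for FindingCategory {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "hallucination" => Ok(FindingCategory::Hallucination),
            "implementation_gap" => Ok(FindingCategory::ImplementationGap),
            "real_debt" => Ok(FindingCategory::RealDebt),
            "false_positive" => Ok(FindingCategory::FalsePositive),
            other => Err(format!("category outside shared vocabulary: {other}")),
        }
    }
}
```

Keeping the category values in one enum is what lets an audit finding flow into Charter telemetry without a translation table.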

Prompt design

  • Primary and secondary prompts are structurally identical. The heterogeneity signal lives in the auditor MODEL (different family per §5.2), not in different prompts. A/B-testing prompt phrasings is forward-looking; v0 keeps them symmetric for clean comparability.
  • Calibrator prompt asks for a status assignment (agreed | disputed | unique_primary | unique_secondary | rejected) per finding. The status counts are cross-checked against the number of finding sections in the body.
  • All three include explicit categorization + discipline rules ("don't fabricate findings", "no external sources beyond the prompt").
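
One way to read the status cross-check is the Rust sketch below: tally the calibrator's per-finding statuses, then compare the total with the number of finding sections in the body. It assumes, hypothetically, that each finding is a `### ` heading in the Markdown body; the real layout is defined by the calibrator prompt and may differ.

```rust
use std::collections::BTreeMap;

/// Status values from the calibrator prompt.
pub const STATUSES: [&str; 5] = [
    "agreed",
    "disputed",
    "unique_primary",
    "unique_secondary",
    "rejected",
];

/// Count how many findings received each status; reject unknown statuses.
pub fn count_statuses(assigned: &[&str]) -> Result<BTreeMap<String, usize>, String> {
    let mut counts: BTreeMap<String, usize> =
        STATUSES.iter().map(|s| (s.to_string(), 0)).collect();
    for status in assigned {
        match counts.get_mut(*status) {
            Some(n) => *n += 1,
            None => return Err(format!("unknown status: {status}")),
        }
    }
    Ok(counts)
}

/// Hypothetical convention: one `### ` heading per finding.
pub fn count_finding_sections(body: &str) -> usize {
    body.lines().filter(|l| l.starts_with("### ")).count()
}

/// The declared status total must equal the number of finding sections.
pub fn cross_check(counts: &BTreeMap<String, usize>, body: &str) -> Result<(), String> {
    let declared: usize = counts.values().sum();
    let found = count_finding_sections(body);
    if declared == found {
        Ok(())
    } else {
        Err(format!(
            "status counts total {declared}, body has {found} finding sections"
        ))
    }
}
```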

Test plan

  • JSON Schema is valid (parses with Python json module).
  • No dist-manifest.yml change needed — .devtrail/ is already declared recursively.
  • PR 2 will validate the schema against real auditor outputs in integration tests.
  • PR 6 will smoke-test a full audit cycle in a tempdir.

🤖 Generated with Claude Code

First of 6 PRs implementing Phase 3 (multi-model external audit) +
the open frictions F2/F5/F7. Framework-only — no CLI code yet.

Artifacts (all under dist/.devtrail/, auto-distributed via the existing
recursive manifest pattern):

- audit-prompts/auditor-primary.md
- audit-prompts/auditor-secondary.md
- audit-prompts/calibrator-reconciler.md
- schemas/audit-output.schema.v0.json

Architectural decision A1 (per the Phase 3 plan): Phase 3 v0 is
ORCHESTRATION-ONLY, not an HTTP-API client. The CLI prepares and
persists prompts, awaits the operator's responses, validates outputs
against the schema, integrates findings into the Charter telemetry —
but does NOT invoke any LLM API directly. Adopters paste the resolved
prompts into their auditor of choice (Copilot, Gemini, Claude, etc.),
save responses to the canonical paths, and the CLI consolidates.

Rationale for orchestration-only:
- Implementing 3 HTTP clients (OpenAI / Google / Anthropic) is 1-2
  weeks of work + perpetual maintenance when APIs change. For an
  EXPERIMENTAL v0 schema, that investment is premature.
- Sentinel's empirical pattern (the 6-cycle dual-audit experiment that
  motivated Phase 3) ALREADY uses this human-in-the-loop shape via
  /plan-audit skills. The CLI's value-add is the canon (prompt shape +
  output schema + telemetry integration), not the API call.
- Closes RFC #82 (audit visibility) by design — the prompt-resolution
  and the auditor's response are both files on disk, version-controlled,
  inspectable, and reproducible by hand if the API call fails.
- Aligns with principle #10 (honesty about what the tool does NOT do):
  "no LLM gateway, no model evaluation".

Schema design:
- audit-output.schema.v0.json uses oneOf to distinguish auditor outputs
  (primary/secondary, fresh findings) from calibrator outputs
  (reconciliation across the two). The `audit_role` field is the
  discriminator — three fixed roles, not arbitrary N.
- findings_by_category enum (hallucination | implementation_gap |
  real_debt | false_positive) is the same vocabulary used by the
  external_audit array in charter-telemetry.schema.v0.json. The audit
  cycle output integrates directly into Charter telemetry at close.
- Every output declares prompt_used: <relative path>, satisfying RFC
  #82's requirement that the prompt path be discoverable from the output.
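
A hypothetical sketch of that integration step: tally one auditor's findings by category and emit a YAML list item for the external_audit array. It assumes the telemetry entry stores per-category counts; the `auditor` key and the layout are illustrative, not copied from charter-telemetry.schema.v0.json.

```rust
/// Category order follows the shared vocabulary.
const CATEGORIES: [&str; 4] = ["hallucination", "implementation_gap", "real_debt", "false_positive"];

/// Render one (hypothetical) `external_audit` YAML list item.
pub fn render_external_audit_entry(
    auditor: &str,
    prompt_used: &str,
    findings: &[&str],
) -> Result<String, String> {
    let mut counts = [0usize; 4];
    for f in findings {
        // Any category outside the shared vocabulary is an error, not a silent drop.
        let idx = CATEGORIES
            .iter()
            .position(|c| c == f)
            .ok_or_else(|| format!("unknown category: {f}"))?;
        counts[idx] += 1;
    }
    let mut out = String::new();
    out.push_str(&format!("- auditor: {auditor}\n"));
    out.push_str(&format!("  prompt_used: {prompt_used}\n"));
    out.push_str("  findings_by_category:\n");
    for (name, n) in CATEGORIES.iter().zip(counts) {
        out.push_str(&format!("    {name}: {n}\n"));
    }
    Ok(out)
}
```

Categories with zero findings are still emitted, so every entry has the same shape at Charter close.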

Prompt design:
- Primary and secondary prompts are STRUCTURALLY IDENTICAL. The
  heterogeneity signal lives in the auditor MODEL (different family
  per §5.2), not in different prompts. A/B-testing prompt phrasings
  is forward-looking; v0 keeps them symmetric for clean comparability.
- Calibrator prompt assumes both auditor outputs as context and asks
  for status assignment (agreed | disputed | unique_primary |
  unique_secondary | rejected) per finding. Status counts cross-check
  against body section count — the schema enforces consistency.
- All three prompts include explicit categorization rules + discipline
  rules ("don't fabricate findings", "no external sources beyond the
  prompt"). The rules are duplicated across the three so the auditor
  doesn't need to consult external documentation.

What's NOT in this PR:
- No CLI code yet — the `devtrail charter audit` command lands in PR 2.
- No heterogeneity validation (`--implementer-family` enforcement) — v1.
- No invocation of LLM APIs — orchestration-only by design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
montfort merged commit 7f15541 into main May 3, 2026
montfort deleted the feat/phase3-pr1-audit-artifacts branch May 3, 2026 06:05
montfort added a commit that referenced this pull request May 3, 2026
…n) (#86)

Second of 6 PRs implementing Phase 3 + open frictions. The CLI command
that orchestrates the dual-audit + calibrator cycle, using the prompt
templates and output schema shipped in PR 1 (#85).

Architecture A1 (orchestration-only) means the CLI does NOT invoke
LLM APIs. The operator pastes resolved prompts into their auditor of
choice (Copilot, Gemini, Claude, etc.) and saves responses to canonical
paths under audit/charters/<CHARTER-ID>/. The CLI's value is structure
(prompt resolution + output schema validation + telemetry-ready YAML),
not invocation.

Three steps, each invokable independently:

  $ devtrail charter audit CHARTER-01
    Step 1/3: PREPARE
    Resolves auditor-primary.prompt.md and auditor-secondary.prompt.md
    against the Charter content + git diff + originating AILOGs, writes
    to audit/charters/CHARTER-01/prompts/.

  $ devtrail charter audit CHARTER-01 --calibrate
    Step 2/3: CALIBRATE
    Validates the two auditor responses against audit-output.schema.v0.json,
    resolves the calibrator-reconciler prompt with both responses
    embedded as context.

  $ devtrail charter audit CHARTER-01 --finalize
    Step 3/3: FINALIZE
    Validates all 3 outputs (auditor-primary + auditor-secondary +
    calibrator), prints a YAML-formatted external_audit array block
    ready to paste into the Charter telemetry, and points to the
    calibrator's reconciliation summary for outcome.scope_change_notes.
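
The flag handling implied by these three invocations can be sketched as follows. AuditStep and select_step are hypothetical names, not the code in src/commands/charter/audit.rs; the point is that no flag means PREPARE and that --calibrate and --finalize cannot be combined. If the CLI is built on clap, the same rule would typically be declared with `conflicts_with` on one of the arguments.

```rust
/// The three steps of an audit cycle.
#[derive(Debug, PartialEq, Eq)]
pub enum AuditStep {
    Prepare,
    Calibrate,
    Finalize,
}

/// Map the two boolean flags to a step; both at once is an error.
pub fn select_step(calibrate: bool, finalize: bool) -> Result<AuditStep, String> {
    match (calibrate, finalize) {
        (false, false) => Ok(AuditStep::Prepare),
        (true, false) => Ok(AuditStep::Calibrate),
        (false, true) => Ok(AuditStep::Finalize),
        (true, true) => Err("--calibrate and --finalize are mutually exclusive".to_string()),
    }
}
```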

Each step is a filesystem mutation. Files persist between steps —
operator can run prepare, walk away, come back days later, run
calibrate. Each step prints clear next-action guidance pointing to
the exact paths involved.
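
Because each step only reads and writes files, step preconditions reduce to path checks. A minimal Rust sketch, assuming the auditor responses live directly under audit/charters/<CHARTER-ID>/ as auditor-primary.md and auditor-secondary.md (hypothetical file names; only the prompts/ subdirectory is named above). The existence check is passed in as a closure so the logic can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical per-Charter layout under audit/charters/<CHARTER-ID>/.
pub struct AuditPaths {
    pub root: PathBuf,
}

impl AuditPaths {
    pub fn new(repo: &Path, charter_id: &str) -> Self {
        AuditPaths {
            root: repo.join("audit").join("charters").join(charter_id),
        }
    }

    /// Where PREPARE writes the resolved prompts.
    pub fn prompts_dir(&self) -> PathBuf {
        self.root.join("prompts")
    }

    /// Hypothetical response file name for a given role.
    pub fn response(&self, role: &str) -> PathBuf {
        self.root.join(format!("{role}.md"))
    }

    /// Auditor responses that must exist before CALIBRATE can run.
    pub fn missing_for_calibrate(&self, exists: impl Fn(&Path) -> bool) -> Vec<PathBuf> {
        ["auditor-primary", "auditor-secondary"]
            .into_iter()
            .map(|role| self.response(role))
            .filter(|p| !exists(p.as_path()))
            .collect()
    }
}
```

In a real run the closure would be `|p| p.exists()`; a non-empty result corresponds to the calibrate-without-auditor-outputs error path.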

Per RFC #82 the resolved prompt is persisted BEFORE any external
action. The schema's prompt_used field cites which prompt template
was used; the calibrator can verify provenance.
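
A hypothetical provenance check along those lines: prompt_used must be repo-relative with no `..` segments and must match a prompt that was persisted earlier. It assumes prompt_used points at the resolved prompt under prompts/ rather than at the template; check_provenance is an illustrative name.

```rust
use std::path::{Component, Path};

/// Reject absolute or escaping paths, then require an exact match
/// against the prompts persisted by the PREPARE step.
pub fn check_provenance(prompt_used: &str, persisted: &[&str]) -> Result<(), String> {
    let p = Path::new(prompt_used);
    if p.is_absolute() || p.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(format!("prompt_used must be a repo-relative path: {prompt_used}"));
    }
    if !persisted.contains(&prompt_used) {
        return Err(format!("prompt_used not found among persisted prompts: {prompt_used}"));
    }
    Ok(())
}
```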

Module shape:
- src/audit_schema.rs: jsonschema wrapper with oneOf-aware error
  formatting, mirroring telemetry_schema.rs and charter_schema.rs.
- src/commands/charter/audit.rs: 3-step run dispatch, template
  resolution with placeholder substitution, frontmatter parsing for
  auditor summaries, external_audit YAML rendering.
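
The frontmatter-parsing step can be approximated with the standard library alone. This sketch assumes auditor responses are Markdown files opening with a `---` YAML block and reads only flat `key: value` pairs; nested YAML is ignored, so a real implementation would use a YAML parser.

```rust
use std::collections::BTreeMap;

/// Split a Markdown document into flat frontmatter fields and the body.
/// Returns None when there is no `---` ... `---` block at the top.
pub fn split_frontmatter(doc: &str) -> Option<(BTreeMap<String, String>, &str)> {
    let rest = doc.strip_prefix("---\n")?;
    let end = rest.find("\n---\n")?;
    let (header, tail) = rest.split_at(end);
    let body = &tail["\n---\n".len()..];

    let mut fields = BTreeMap::new();
    for line in header.lines() {
        // Only top-level `key: value` lines; indented (nested) keys are skipped.
        if let Some((k, v)) = line.split_once(':') {
            let v = v.trim();
            if !k.starts_with(' ') && !v.is_empty() {
                fields.insert(k.trim().to_string(), v.trim_matches('"').to_string());
            }
        }
    }
    Some((fields, body))
}
```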

Placeholders supported in templates:
  {{charter_id}}, {{charter_title}}, {{charter_path}},
  {{charter_content}}, {{git_range}}, {{git_diff}},
  {{ailog_paths}}, {{ailog_contents}}, {{audit_role}},
  {{schema_path}}, {{auditor_primary_findings}},
  {{auditor_secondary_findings}}.

Unknown placeholders are left as literals (no surprise mutations).
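
The substitution rule, including the literal pass-through for unknown names, can be sketched as a single left-to-right scan. This is an illustration, not the implementation in audit.rs; trimming whitespace inside the braces is an extra assumption. Because the scan never re-reads inserted values, a `{{...}}` sequence inside, say, the {{git_diff}} output is not expanded a second time.

```rust
use std::collections::HashMap;

/// Replace known `{{name}}` placeholders; leave unknown ones verbatim.
pub fn resolve_template(template: &str, vars: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        let after_open = &rest[start + 2..];
        // No closing braces: stop and copy the remainder unchanged below.
        let Some(end) = after_open.find("}}") else { break };
        out.push_str(&rest[..start]);
        let name = after_open[..end].trim();
        match vars.get(name) {
            Some(value) => out.push_str(value),
            // Unknown placeholder: copy `{{...}}` through exactly as written.
            None => out.push_str(&rest[start..start + end + 4]),
        }
        rest = &after_open[end + 2..];
    }
    out.push_str(rest);
    out
}
```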

Tests:
- 5 unit tests in src/audit_schema.rs (auditor vs calibrator oneOf
  discriminator, charter_id pattern, auditors_reconciled minItems).
- 5 unit tests in src/commands/charter/audit.rs (canonical_id,
  template substitution, frontmatter parsing, AuditorSummary).
- 7 integration tests in cli/tests/charter_audit_test.rs covering
  all three steps + error paths (devtrail-not-installed, unknown
  charter, calibrate-without-auditor-outputs, schema validation
  failure, full cycle, mutually-exclusive flags).

400/400 tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>